2017-{07..08}, Josh Montague
This is Part 3 of the "Time Series Modeling in Python" series.
If you'd like to hop back to earlier sections, recall that we've already covered the pandas library features that make working with time series particularly convenient. In this section, we'll look at a common pattern for modeling time series data that takes greater advantage of prior observations in the series. Specifically, we're going to look at breaking down an observed time series into components that represent underlying trend and seasonality. Additionally, we'll write and use an "STL decomposition," because it's a nice, interpretable model that doesn't exist in statsmodels
and I think it should.
The general outline we'll follow: decompose an observed series into trend, seasonal, and residual components with statsmodels' seasonal_decompose; build our own STL-style decomposition around the Lowess smoother; and then use that decomposition to forecast forward.
Consistent with earlier parts of the series, the OTexts content on forecasting is a great reference, and we'll follow some of the conventions there, particularly Chapter 6.5.
In [ ]:
import pandas as pd
import numpy as np
import scipy as sp
import statsmodels.api as sm
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.rc('figure', figsize=(10,8))
In the previous parts of this series, our modeling (and forecasting) efforts often ignored most of the data in our time series. Our predictions were either simply based on the last-observed point, or a single point some relatively small distance in the past.
Now, we'll consider larger-scale patterns in the time series data. Often these patterns are described in terms of a decomposition into "components." The components we'll refer to in this session are the seasonal component ($S$), the trend component ($T$), and the residual or "error" component ($E$).
We can write out the combined model as:
$y = S + T + E$, if the model is additive,
or
$y = S \times T \times E$, if the model is multiplicative.
The additive model makes sense when the magnitude of the seasonal component doesn't vary with the absolute level of the series, and the multiplicative model makes sense when that variation is present. This is often an empirical choice, and the additive model is often the default setting.
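(A side note, not used below: if a series looks multiplicative, a log transform turns it into an additive decomposition problem, since $\log(S \times T \times E) = \log S + \log T + \log E$. Here's a minimal sketch, where y_mult is just a made-up series with level-dependent seasonality.)
In [ ]:
# toy series whose seasonal swing grows with the level (multiplicative-ish)
t = np.arange(400)
y_mult = (10 + 0.05 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 50))
# after a log transform, the seasonal swing is roughly constant in size -> additive model applies
y_add = np.log(y_mult)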
The specific implementation of both the "trend" and "seasonal" components are variable, and we'll look at a couple of examples below.
Like many libraries, statsmodels has some datasets built in. This dataset contains weekly observations of atmospheric CO$_2$ at Mauna Loa, Hawaii, spanning roughly forty years.
In [ ]:
# http://www.statsmodels.org/stable/datasets/index.html
dataset = sm.datasets.co2.load_pandas()
# special sm object ("bunch"-like cf sklearn datasets)
dataset
In [ ]:
# has a pd.dataframe in the .data attr
co2_ts = dataset.data
co2_ts.head()
In [ ]:
co2_ts.plot(title='Mauna Loa Weekly Atmospheric CO2 Data')
plt.xlabel('date')
plt.ylabel('CO2 concentration (ppmv)')
Right off the bat, we can identify some qualities of this data: a steady upward trend, a clear annual cycle, and some gaps (missing values) in the observations.
Since many timeseries models don't play nicely with missing data or NaNs, let's fill in the missing data with linear interpolations.
In [ ]:
# how much missing data?
co2_ts.isnull().sum()
In [ ]:
# often, ts methods/libs don't play nicely with missing data/NaNs, so
# fill missing values (recall pd timeseries methods from Part 1!)
co2_ts = (co2_ts
.resample('D')
.mean()
.interpolate('linear')
)
co2_ts.head(10)
In [ ]:
co2_ts.plot()
# if you want a close-up
#co2_ts.head(1000).plot()
In [ ]:
from statsmodels.tsa.seasonal import seasonal_decompose
In [ ]:
# sm has recently been updated to accept pd.dataframes in more places (previously all numpy arrays)
seas_decomp = seasonal_decompose(co2_ts, model='additive', freq=365)
seas_decomp
Use seas_decomp.<TAB>
on the object above to see its attributes. It's simple: a few arrays and a .plot()
method.
In [ ]:
# the DecomposeResult doesn't expose very much for you, just arrays and a mpl.figure convenience method
fig = seas_decomp.plot()
This figure highlights the goal: take the observed data (top), identify the overall trend (a low-variance version of the observations, second chart), and identify the periodic component (called "seasonal," regardless of its frequency). Then, whatever is left after subtracting these from the observed data is the residual (bottom chart). Ideally, we're left with small-amplitude, Gaussian-ish residuals. In this example, it looks like relatively random, zero-centered jitters, at a level that is a fraction of a percent of the observations. That's good!
Note: inspecting the data for its most significant periodicity is important! By default, statsmodels looks inside the pandas frame and maps the frequency of the index to a filter frequency. That sounds reasonable, but here the daily data leads to a filter width of 7, i.e. a weekly smoothing. However, our data doesn't have weekly periodicity, so the trend ends up overfit (see the charts just below). We have to look at our data and observe that the major periodicity is annual, so (above) we set the frequency to 365 (measured in index steps, given the daily spacing of the observed data).
In [ ]:
# default settings -> read the pd.datetimeindex freq (1 day), assumes weekly cycles
seas_decomp = seasonal_decompose(co2_ts)
# overfit!
seas_decomp.plot();
Recall, we have basically two goals:
The seasonal_decompose
method isn't technically an "STL" (Seasonal-Trend decomposition using Loess) because it doesn't use the Loess method (more on this soon), but it does decompose the signal into seasonal and trend components. The statsmodels
function uses a convolution between the observed data and a filter (roughly, a square pulse) defined by the specified frequency of the seasonal component.
The method will introspect a dataframe for the frequency of its DatetimeIndex, but you may have to manually override it. If you have a month of daily sales data, the largest seasonal component is probably weekly. In our example, we have decades of daily CO$_2$ data and the biggest seasonal component is the annual one (365). You could imagine using a Fourier transform to identify the largest seasonal component automatically, but this is left as an exercise for the reader.
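(If you do want a rough version of that exercise, one sketch: remove a simple polynomial trend, then look for the largest peak in the periodogram. The 5-year cutoff below is an arbitrary choice to keep leftover low-frequency trend power from dominating the search.)
In [ ]:
# rough periodicity check: detrend with a low-order polynomial, then find the
# largest peak in the periodogram (ignoring very long periods left by the trend)
values = np.asanyarray(co2_ts).squeeze()
x = np.arange(len(values))
rough_detrend = values - np.polyval(np.polyfit(x, values, 2), x)
power = np.abs(np.fft.rfft(rough_detrend)) ** 2
freqs = np.fft.rfftfreq(len(values), d=1.0)      # cycles per day (daily samples)
mask = freqs > 1.0 / (5 * 365)                   # ignore periods longer than ~5 years
peak_freq = freqs[mask][np.argmax(power[mask])]
print('dominant period ~ {:.0f} days'.format(1.0 / peak_freq))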
So what is a convolution? Generally it's an operation that takes two functions, and produces another. Roughly, it's the integral of the pointwise multiplication of the two functions as a function of the amount that one of the original functions is translated. I always remember it best as "smearing one curve across another while dragging it from left to right." I find this figure (from the wiki page) helpful.
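(If it helps to see that in a couple of lines of numpy: convolving a noisy signal with a short "boxcar" -- an array of equal weights that sum to one -- is just a moving average.)
In [ ]:
# convolution as smoothing: drag a boxcar filter across a noisy signal
noisy = np.sin(np.linspace(0, 6 * np.pi, 300)) + np.random.normal(scale=0.3, size=300)
boxcar = np.repeat(1.0 / 25, 25)                 # 25 equal weights that sum to 1
smoothed = np.convolve(noisy, boxcar, mode='same')
plt.plot(noisy, '.', alpha=0.4, label='noisy signal')
plt.plot(smoothed, '-', label='convolved with boxcar')
plt.legend();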
We don't have to go so far as writing our own convolution. This one from statsmodels is built around the relevant tools from scipy. Below, I'll follow, briefly, what the seasonal_decompose()
method is doing under the hood. Don't get too hung up on the concept of the convolution; it's just one way to smooth the observed data into a trend that happens to account for variation over a particular window size. We'll look at another example shortly.
We'll calculate a few things, then plot them all together for comparison.
In [ ]:
from statsmodels.tsa.filters.filtertools import convolution_filter
In [ ]:
# the filter size (units = # of data points)
freq = 365
# a boxcar filter: `freq` points, each equal to 1/freq (so it averages over the window)
filt = np.repeat(1./freq, freq)
#filt
In [ ]:
# number of data points to include in inspection charts
nobs = 2000
In [ ]:
# convert our dataframe of observations to 1-d np.array
observed = np.asanyarray(co2_ts).squeeze()
observed[:nobs]
In [ ]:
# calculate the "trend" array by dragging filter across data
trend = convolution_filter(observed, filt)
#trend[:nobs]
In [ ]:
# subtract the trend array ("standardize" values)
detrended = observed - trend
# these values should be closer to 0
detrended[:nobs]
In [ ]:
fig, axes = plt.subplots(3, 1, sharex=True)
# filter
plt.subplot(311)
# note: i'm artificially adding a bunch of 0s to the filter here so we can see it on the same x scale
plt.plot(np.append(filt, np.array([0]*(nobs - len(filt)))), '.-', label='filter')
plt.title("first {} observed data points".format(nobs))
plt.legend()
# trend
plt.subplot(312)
plt.plot(observed[:nobs], '.-', label='observed')
plt.plot(trend[:nobs], '.-', label='trend')
plt.legend()
# standardized / de-trended
plt.subplot(313)
plt.plot(detrended[:nobs], '.-', label='detrended')
plt.legend()
Now we use the detrended series to find the seasonal component. In statsmodels, they do this with an interesting empirical method: they calculate the mean value for each of the freq
points at indexes with the specified periodicity. That is, the mean across the values at index 0, 365, 730, etc., then the mean across the values at index 1, 366, 731, etc., and so on, until we've covered each of the freq
(here, 365) possible means.
In [ ]:
# special handling of mean wrt NaN values
from pandas.core.nanops import nanmean as pd_nanmean
In [ ]:
# calculate mean value in each point of the seasonal cycle (defined by freq)
# e.g. each day in the annual cycle
period_averages = np.array([pd_nanmean(detrended[i::freq]) for i in range(freq)])
# center average cycle
period_averages -= np.mean(period_averages)
len(period_averages)
In [ ]:
plt.plot(period_averages, '.-', label='seasonal cycle')
plt.xlabel('point in cycle')
plt.ylabel('mean value')
plt.legend();
In [ ]:
# tile out the seasonal curve - this becomes the seasonal part of the model
seasonal = np.tile(period_averages, len(observed) // freq +1)[:len(observed)]
#len(seasonal)
In [ ]:
plt.plot(seasonal[:nobs], '.-')
In [ ]:
# the residual is what's left over after the seasonal component
# is subtracted from the detrended data
resid = detrended - seasonal
Now we have all the pieces we need to construct the seasonal_decompose
output: trend and seasonal components, and the residual measurements.
In [ ]:
# decomposition details (first `nobs` points)
fig, axes = plt.subplots(4, 1, sharex=True)
# trend
plt.subplot(411)
plt.plot(observed[:nobs], '.-', label='observed')
plt.plot(trend[:nobs], '-', label='trend')
plt.title('decomposition on first {} points'.format(nobs))
plt.legend()
# detrended
plt.subplot(412)
plt.plot(detrended[:nobs], '-', label='detrended')
plt.legend()
# seasonal
plt.subplot(413)
plt.plot(seasonal[:nobs], '-', label='seasonal')
plt.legend()
# residual
plt.subplot(414)
plt.plot(resid[:nobs], '-', label='residual')
plt.legend()
In [ ]:
# all data points
fig, axes = plt.subplots(4, 1, sharex=True)
# trend
plt.subplot(411)
plt.plot(observed, '-', label='observations')
plt.plot(trend, '-', label='trend')
plt.title('decomposition on all points')
plt.legend()
# detrended
plt.subplot(412)
plt.plot(detrended, '-', label='detrended')
plt.legend()
# seasonal
plt.subplot(413)
plt.plot(seasonal, '-', label='seasonal')
plt.legend()
# residuals
plt.subplot(414)
plt.plot(resid, '-', label='residual')
plt.legend()
In [ ]:
# look at the distribution of residuals ( hist() doesn't like NaNs )
plt.hist([ x for x in resid if not np.isnan(x) ], bins=100);
plt.xlabel('residual')
plt.ylabel('count')
plt.title('distribution of residuals')
If the residual distribution were really skewed, we might reconsider our assumption about the seasonal frequency, and possibly modify our filter (in the convolution example). At the very least, it would set our expectations for how any predictions would over- or under-estimate real values.
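(A quick numeric version of that eyeball test, using scipy's skewness statistic: values near zero suggest a roughly symmetric residual distribution.)
In [ ]:
# skewness of the (finite) residuals; ~0 means roughly symmetric
from scipy import stats
finite_resid = resid[~np.isnan(resid)]
print('residual skewness: {:.3f}'.format(stats.skew(finite_resid)))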
The seasonal_decompose
method in statsmodels is perfectly fine, but I've long wanted statsmodels to have a proper STL decomposition. There are many libraries full of advanced methods for time series forecasting in Python (and we'll likely see some later in this series!), but not as many that are simple. I value simplicity and interpretability, both of which I think STL embodies. Looking at the functionality of R's forecast
library can be a little depressing for Pythoneers in this regard (Hyndman is prolific).
So, if it doesn't exist? Let's just make it!
The STL ("Seasonal-Trend decomposition using Loess") procedure was published in 1990 by Cleveland et al., as an application of the Lowess ("locally weighted scatterplot smoothing") method Cleveland published in 1979. statsmodels doesn't have an out-of-the-box STL decomposition method, but it does have an implementation of the Lowess smoother.
Lowess is generally a super handy way of highlighting trends in noisy data. It's an underutilized tool (I think) that also shows up in e.g. ggplot, and now seaborn. Before we use it in a decomposition, let's make sure we understand what it's doing by itself...
In [ ]:
lowess = sm.nonparametric.lowess
In [ ]:
# make some noisy data
nobs = 200
x = np.linspace(0,nobs-1,nobs)
# sine-ish
#y = 0.02*x + np.sin(0.25*x) + np.random.normal(scale=0.25, size=len(x))
# linear-ish
y = 0.02*x + np.random.normal(scale=0.75, size=len(x))
In [ ]:
plt.plot(y, 'o')
Unfortunately, statsmodels has a different API philosophy than, say, sklearn: its (endog, exog)
function signature can take some getting used to if you're used to seeing the (X, y)
pattern. Ultimately, it just takes a moment to translate in your mind.
There are a couple of parameters in a Lowess model, but the main one that affects the fit is frac
, which controls how much adjacent data is used in calculating any one point of the smoothed curve: frac is the fraction (of the entire array of data) around the relevant point that is used.
The regression is locally weighted - meaning nearby observations count more than distant ones - by a function whose shape is roughly between a short-tailed Gaussian and a rounded square wave.
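(For reference, that local weight is the "tricube" function: an observation at scaled distance $d$ from the point being fit gets weight $(1 - |d|^3)^3$, and zero beyond $d = 1$. Plotting it shows the Gaussian-meets-square-wave shape described above.)
In [ ]:
# the tricube weight function used in lowess
d = np.linspace(-1.2, 1.2, 500)
tricube = np.where(np.abs(d) < 1, (1 - np.abs(d) ** 3) ** 3, 0.0)
plt.plot(d, tricube)
plt.xlabel('scaled distance from the point being fit')
plt.ylabel('weight');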
Let's compare output, varying the settings.
In [ ]:
# default frac=0.66
z = lowess(y, x, return_sorted=False)
# compare to a couple extreme values of frac
frac1 = 0.1
frac2 = 0.9
w = lowess(y, x, frac=frac1, return_sorted=False)
v = lowess(y, x, frac=frac2, return_sorted=False)
In [ ]:
plt.plot(y, '.', label='obs')
plt.plot(z, '--', label='frac=0.66 (default)')
plt.plot(w, '--', label='frac={}'.format(frac1))
plt.plot(v, '--', label='frac={}'.format(frac2))
plt.legend()
Note: the importance of the frac
kwarg changes with the number of data points. Small N (total observations) means a much higher variance model than large N for the same frac
value.
This time, I'll skip the drawn out version we used previously and drop the Lowess smoothing in as the trend calculation in a decomposition. I'm also going to import some utilities directly from statsmodels. We'll talk through them, and if you're curious for more, you can poke around in the statsmodels source code.
In [ ]:
def stl_decompose(df, freq=365, lo_frac=0.6, lo_delta=0.0):
    """An STL decomposition of time series data.

    This implementation is modeled after the statsmodels seasonal_decompose method
    but substitutes Lowess for convolution in its trend estimation.

    Parameters
    ----------
    df : pd.DataFrame
        Time series of counts with a DatetimeIndex.
    freq : int, optional
        Most significant periodicity in the time series.
    lo_frac : float, optional
        Fraction of data to use in Lowess smoothing.
    lo_delta : float, optional
        Fractional distance within which to use linear interpolation
        instead of weighted regression. Significantly affects computation time!

    Returns
    -------
    results : obj
        An object with seasonal, trend, and resid attributes.

    Notes
    -----
    This is an additive model, Y[t] = T[t] + S[t] + e[t]
    """
    # use some existing pieces of statsmodels
    from statsmodels.tsa.seasonal import DecomposeResult
    from statsmodels.tsa.filters._utils import _maybe_get_pandas_wrapper_freq
    import statsmodels.api as sm
    from pandas.core.nanops import nanmean as pd_nanmean

    lowess = sm.nonparametric.lowess
    _pandas_wrapper, _ = _maybe_get_pandas_wrapper_freq(df)

    # get plain np array
    observed = np.asanyarray(df).squeeze()
    # calc trend, remove from observation
    trend = lowess(observed, [x for x in range(len(observed))],
                   frac=lo_frac,
                   delta=lo_delta * len(observed),
                   return_sorted=False)
    detrended = observed - trend
    # calc one-period seasonality, remove tiled array from detrended
    period_averages = np.array([pd_nanmean(detrended[i::freq]) for i in range(freq)])
    # 0-center the period avgs
    period_averages -= np.mean(period_averages)
    seasonal = np.tile(period_averages, len(observed) // freq + 1)[:len(observed)]
    resid = detrended - seasonal

    # convert the arrays back to appropriate dataframes, stuff them back into
    # the statsmodels object
    results = list(map(_pandas_wrapper, [seasonal, trend, resid, observed]))
    dr = DecomposeResult(seasonal=results[0],
                         trend=results[1],
                         resid=results[2],
                         observed=results[3],
                         period_averages=period_averages)
    return dr
In [ ]:
co2_ts.head()
In [ ]:
# use just like the seasonal_decompose method
co_stl = stl_decompose(co2_ts, freq=365, lo_frac=0.1)
In [ ]:
# we inherit a .plot() method from DecomposeResult
co_stl.plot();
In [ ]:
# or make your own custom figure with the internal df attributes
fig, axes = plt.subplots(4, 1)
# obs + trend
plt.subplot(411)
plt.plot(co_stl.observed, '-', label='x')
plt.plot(co_stl.trend, '-', label='Lowess trend')
plt.legend()
# seasonal
plt.subplot(412)
plt.plot(co_stl.seasonal, '-', label='seasonal')
plt.legend()
# residuals
plt.subplot(413)
plt.plot(co_stl.resid, '-', label='residual')
plt.legend()
# resid hist
plt.subplot(414)
plt.hist(co_stl.resid.values, bins=100)
plt.xlabel('residual')
plt.ylabel('count')
We can also pass through an extra kwarg that the Lowess method supports (delta), which lets us skip fitting a regression for every point and instead linearly interpolate any point that's within some distance of the last one calculated. This trades a little accuracy for less runtime.
In [ ]:
# this will take ~1-2 minutes
%timeit -n3 -r3 stl_decompose(co2_ts, lo_frac=0.25)
In [ ]:
# don't recalc regression w/in 1% windows
%timeit -n3 -r3 stl_decompose(co2_ts, lo_frac=0.25, lo_delta=0.01)
Now that we have a decomposition, it'd be great if we could project it forward. Continuing to follow the methods as before, we can consider the components of the decomposition, $y = S + T + E$, forecast them separately, and then combine them appropriately. For now, we're only going to consider additive models, and we're not going to incorporate $E$
(maybe later!).
We'll start by forecasting the trend component. In the previous session, we defined a set of simple one-point-ahead forecasting methods that we can reuse here. We can import them from the module included in this repo.
The workflow we aim for: fit the STL decomposition to the observed series, iteratively forecast the trend forward with one of those simple functions, optionally add the (phase-aligned) seasonal component, and then compare the result against the data.
In [ ]:
# some methods from earlier RST
from rst_timeseries import fc_naive, fc_drift
In [ ]:
def forecast_stl(stl, steps=10, seasonal=False, fc_func=None, **fc_func_kwargs):
    """Create an (additive) forecast, using the given forecasting function,
    the specified number of steps forward, including optional seasonality,
    given an STL decomposition of an observed time series.

    Parameters
    ----------
    stl : an extended statsmodels.tsa.seasonal.DecomposeResult
        STL decomposition of the observed time series.
    steps : int, optional
        Number of forward steps to include in the forecast.
    seasonal : boolean, optional
        Include the seasonal component in the forecast.
    fc_func : function
        Forecasting function which takes an array of observations and returns a
        single-valued forecast for the next point.
    fc_func_kwargs : keyword arguments
        All remaining arguments are passed to the forecasting function fc_func.

    Returns
    -------
    forecast_frame : pd.DataFrame
        A pandas dataframe with a timeseries index corresponding to the specified
        forecast duration.

    Notes
    -----
    This is an additive model, Y[t] = T[t] + S[t] + e[t]
    """
    ## container for forecast values
    forecast_array = np.array([])

    ## forecast trend
    # unpack precalculated trend array from the stl frame
    #trend_array = np.asanyarray(stl.trend).squeeze()
    trend_array = stl.trend

    # iteratively forecast trend ("seasonally adjusted") component
    for step in range(steps):
        # make this prediction on all available data
        pred = fc_func(np.append(trend_array, forecast_array), **fc_func_kwargs)
        # add this prediction to current array
        forecast_array = np.append(forecast_array, pred)

    # forecast index starts one unit beyond observed series
    ix_start = stl.observed.index[-1] + pd.Timedelta(1, stl.observed.index.freqstr)
    forecast_idx = pd.DatetimeIndex(freq=stl.observed.index.freqstr,
                                    start=ix_start,
                                    periods=steps)

    ## (optionally) forecast seasonal & combine
    if seasonal:
        # track index and value of max correlation
        s_ix = 0
        max_correlation = -np.inf
        # loop over indexes=length of period avgs
        detrended_array = np.asanyarray(stl.observed - stl.trend).squeeze()
        for i, x in enumerate(stl.period_averages):
            # work slices backward from end of detrended observations
            if i == 0:
                # slicing w/ [x:-0] doesn't work
                detrended_slice = detrended_array[-len(stl.period_averages):]
            else:
                detrended_slice = detrended_array[-(len(stl.period_averages) + i):-i]
            # calculate corr b/w period_avgs and detrended_slice
            this_correlation = np.correlate(detrended_slice, stl.period_averages)[0]
            if this_correlation > max_correlation:
                # update ix and max correlation
                max_correlation = this_correlation
                s_ix = i
        # roll seasonal signal to matching phase
        rolled_period_averages = np.roll(stl.period_averages, -s_ix)
        # tile as many times as needed to reach `steps`, then truncate
        tiled_averages = np.tile(rolled_period_averages,
                                 (steps // len(stl.period_averages) + 1))[:steps]
        # add seasonal values to previous forecast
        forecast_array += tiled_averages

    # combine data array with index into named dataframe
    forecast_frame = pd.DataFrame(data=forecast_array, index=forecast_idx)
    forecast_frame.columns = [fc_func.__name__]
    return forecast_frame
To demonstrate how this works, let's start by using the same dataset, but assume that we have only observed a fraction of it when we want to make our forecast.
In [ ]:
# take a fraction to forecast
co2_ts_short = co2_ts.head(10000)
plt.plot(co2_ts, '--', label='full')
plt.plot(co2_ts_short, '-', label='short')
plt.legend();
First, we'll fit the STL model to our observed data set, and use the same daily frequency as before, plus some of the Lowess settings we looked at for efficiency.
In [ ]:
# model the abbreviated series with our STL decomposition
stl = stl_decompose(co2_ts_short, freq=365, lo_frac=0.25, lo_delta=0.01)
# note the last date of the observed frame
# and compare to first date in forecast frame
co2_ts_short.tail()
Note the final data point in our observed (and modeled) time series. Our forecast should pick up right after this point.
We'll forecast our model out 1000 steps, which will cover a few periods of our data. We'll also use the naive "drift" forecast. Recall from the last session that this makes its next-point prediction based on a linear fit through the last observed point and the point n
steps back (default n=3
).
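(If you don't have the repo's rst_timeseries module handy, here's a rough stand-in for the drift and naive ideas -- not necessarily the module's exact code -- just so the forecasting step below is self-contained.)
In [ ]:
def fc_drift_sketch(data, n=3):
    """Rough stand-in for fc_drift: extrapolate the average slope over the
    last `n` observed points one step forward."""
    data = np.asarray(data)
    slope = (data[-1] - data[-n]) / (n - 1)
    return data[-1] + slope

def fc_naive_sketch(data):
    """Rough stand-in for fc_naive: the next value is just the last observed one."""
    return np.asarray(data)[-1]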
In [ ]:
# now forecast forward `steps` values, with optional seasonality, and
# using one of our imported forecasting functions
fc_df = forecast_stl(stl, steps=1000, seasonal=True, fc_func=fc_drift)
fc_df.head()
In [ ]:
# compare the forecast to our observations, ground truth, and the modeled trend
plt.plot(co2_ts, ':', label='ground truth')
plt.plot(co2_ts_short.tail(12000), label='obs')
plt.plot(stl.trend, '--', label='stl.trend')
plt.plot(fc_df, '.-', label='forecast')
# zoom way in on the relevant region
#plt.xlim('1980','1993'); plt.ylim(330,370);
plt.legend();
Just to be sure it works, let's try it on another time series...
The small file boulder.csv
included in misc/
is an approximate daily count of the appearances of "boulder" on Twitter between April and mid-August.
In [ ]:
bdr_counts = (pd.read_csv('misc/boulder.csv', parse_dates=[0])
.set_index('dt')
)
bdr_counts.head()
As before, we should fill NaNs; in this demo there aren't actually any, but the step is still important because the resample method also ensures that our DatetimeIndex has the right freq
attribute, which we need in our STL method.
In [ ]:
# fill NaNs (none in this case), and ensure that the df has freq='D' set
bdr_counts = (bdr_counts
.resample('D')
.mean()
.interpolate('linear')
)
#bdr_counts.index
In [ ]:
bdr_counts.plot()
plt.ylabel('daily "boulder" counts')
In [ ]:
len(bdr_counts)
In [ ]:
bdr_short = bdr_counts.head(100)
In [ ]:
# it seems the weekly cycle is biggest and data is daily, so freq=7
stl = stl_decompose(bdr_short, freq=7, lo_frac=0.4, lo_delta=0.01)
In [ ]:
stl.plot();
In [ ]:
stl.resid.hist(bins=50);
These residuals are certainly larger than the ones we had with our first data set.
In [ ]:
bdr_fc = forecast_stl(stl, steps=50, seasonal=True, fc_func=fc_drift)
# plot
plt.plot(bdr_counts, '--o', label='truth')
plt.plot(bdr_short, '--o', label='observed')
plt.plot(bdr_fc, '-o', label='forecast')
plt.plot(stl.trend, ':', label='trend')
plt.ylabel('daily "boulder" counts')
plt.xlabel('date')
plt.legend();
# zoom into forecast region
#plt.xlim('2017-05-15','2017-09'); plt.ylim(1500,3500)
Not bad!
That definitely isn't as beautiful and obvious as the first data set. But it's also a good example of a case where the seasonal component is not the overwhelmingly dominant feature: here, the daily (and weekly) fluctuations are of the same size as (or bigger than) the modeled seasonal component.
Another thing to observe with this model is the effect of the lo_frac
term - that is, how narrow a bandwidth is used in the Lowess trend. If you turn it down to ~0.3, you'll see the last few points of the trend heading up and to the right, leading the forecast (which, recall, is a linear extrapolation of the fitted Lowess trend) to diverge quite a lot from the observed data.
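(To see that for yourself, refit with the narrower window and re-run the forecast; the settings here are just for experimenting.)
In [ ]:
# narrower Lowess window -> trend follows the recent upswing -> forecast diverges
stl_narrow = stl_decompose(bdr_short, freq=7, lo_frac=0.3, lo_delta=0.01)
fc_narrow = forecast_stl(stl_narrow, steps=50, seasonal=True, fc_func=fc_drift)
plt.plot(bdr_counts, '--o', label='truth')
plt.plot(stl_narrow.trend, ':', label='trend (lo_frac=0.3)')
plt.plot(fc_narrow, '-o', label='forecast (lo_frac=0.3)')
plt.legend();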
So, sadly (but hopefully not surprisingly) there is no silver bullet! But, with some tuning and observation, this is one relatively straightforward method of forecasting a time series.
In summary, STL decompositions are pretty handy, and relatively efficient.
Pros: it's simple and interpretable; the trend, seasonal, and residual components are easy to inspect; and the forecasting step can plug in any simple point-forecast function.
Cons: you have to choose the seasonal frequency and the Lowess frac (and delta) by hand; we only handled the additive case (and ignored the residual term in forecasts); and the forecast is sensitive to how the trend is extrapolated, as we saw with lo_frac.